Double the trouble: handling noise and reverberation in far-field automatic speech recognition

نویسندگان

  • David Gelbart
  • Nelson Morgan
چکیده

Far-field microphone speech signals cause high error rates for automatic speech recognition systems, due to room reverberation and lower signal-to-noise ratios. We have observed large increases in speech recognition word error rates when using a far-field (3-6 feet) microphone in a conference room, in comparison with recordings from close-talking microphones. In an earlier paper, we showed improvements in far-field speech recognition performance using a longterm log spectral subtraction method to combat reverberation. This method is based on a principle similar to cepstral mean subtraction but uses a much longer analysis window (e.g., 1 s) in order to deal with reverberation. Here we show that a combination of short-term noise filtering and longterm log spectral subtraction can further reduce recognition word error rates.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Double the Trouble: Handling Noise and Automatic Speech Rec

Far-field microphone speech signals cause high error rates for automatic speech recognition systems, due to room reverberation and lower signal-to-noise ratios. We have observed large increases in speech recognition word error rates when using a far-field (3-6 feet) microphone in a conference room, in comparison with recordings from close-talking microphones. In an earlier paper, we showed impr...

متن کامل

Reverberation Suppression Based on Sparse Linear Prediction in Noisy Environments

We present a single channel method for late reverberation suppression. The proposed approach estimates late reverberation as a linear combination of previous time-frequency frames. We impose a sparsity constraint on the predictor in order to select the most relevant signal frames for the estimation. The dataset used for the evaluation is corrupted by background noise, thus we propose to jointly...

متن کامل

Techniques for handling convolutional distortion with 'missing data' automatic speech recognition

In this study we describe two techniques for handling convolutional distortion with ‘missing data’ speech recognition using spectral features. The missing data approach to automatic speech recognition (ASR) is motivated by a model of human speech perception, and involves the modification of a hidden Markov model (HMM) classifier to deal with missing or unreliable features. Although the missing ...

متن کامل

Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...

متن کامل

Feature mapping using far-field microphones for distant speech recognition

Acoustic modeling based on deep architectures has recently gained remarkable success, with substantial improvement of speech recognition accuracy in several automatic speech recognition (ASR) tasks. For distant speech recognition, the multi-channel deep neural network based approaches rely on the powerful modeling capability of deep neural network (DNN) to learn suitable representation of dista...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002